Structural effects of network sampling coverage I: Nodes missing at random

نویسندگان

  • Jeffrey A. Smith
  • James Moody
چکیده

Network measures assume a census of a well-bounded population. This level of coverage is rarely achieved in practice, however, and we have only limited information on the robustness of network measures to incomplete coverage. This paper examines the effect of node-level missingness on 4 classes of network measures: centrality, centralization, topology and homophily across a diverse sample of 12 empirical networks. We use a Monte Carlo simulation process to generate data with known levels of missingness and compare the resulting network scores to their known starting values. As with past studies (Borgatti et al 2006; Kossinets 2006), we find that measurement bias generally increases with more missing data. The exact rate and nature of this increase, however, varies systematically across network measures. For example, betweenness and Bonacich centralization are quite sensitive to missing data while closeness and in-degree are robust. Similarly, while the tau statistic and distance are difficult to capture with missing data, transitivity shows little bias even with very high levels of missingness. The results are also clearly dependent on the features of the network. Larger, more centralized networks are generally more robust to missing data, but this is especially true for centrality and centralization measures. More cohesive networks are robust to missing data when measuring topological features but not when measuring centralization. Overall, the results suggest that missing data may have quite large or quite small effects on network measurement, depending on the type of network and the question being posed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sampling from social networks’s graph based on topological properties and bee colony algorithm

In recent years, the sampling problem in massive graphs of social networks has attracted much attention for fast analyzing a small and good sample instead of a huge network. Many algorithms have been proposed for sampling of social network’ graph. The purpose of these algorithms is to create a sample that is approximately similar to the original network’s graph in terms of properties such as de...

متن کامل

Missing data in networks: exponential random graph (p∗) models for networks with non-respondents

Survey studies of complete social networks often involve non-respondents, whereby certain people within the “boundary” of a network do not complete a sociometric questionnaire—either by their own choice or by the design of the study—yet are still nominated by other respondents as network partners. We develop exponential random graph (p∗) models for network data with non-respondents. We model re...

متن کامل

Missing Data as Part of the Social Behavior in Real-World Financial Complex Systems

Many real-world networks are known to exhibit facts that counter our knowledge prescribed by the theories on network creation and communication patterns. A common prerequisite in network analysis is that information on nodes and links will be complete because network topologies are extremely sensitive to missing information of this kind. Therefore, many real-world networks that fail to meet thi...

متن کامل

Unauthenticated event detection in wireless sensor networks using sensors co-coverage

Wireless Sensor Networks (WSNs) offer inherent packet redundancy since each point within the network area is covered by more than one sensor node. This phenomenon, which is known as sensors co-coverage, is used in this paper to detect unauthenticated events. Unauthenticated event broadcasting in a WSN imposes network congestion, worsens the packet loss rate, and increases the network energy con...

متن کامل

Performance of random walks in one-hop replication networks

Random walks are gaining much attention from the networks research community. They are the basis of many proposals aimed to solve a variety of network-related problems such as resource location, network construction, nodes sampling, etc. This interest on random walks is justified by their inherent properties. They are very simple to implement as nodes only require local information to take rout...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Social networks

دوره 35 4  شماره 

صفحات  -

تاریخ انتشار 2013